Mark scheme: Justified – understanding of the data and their
context
Introduction
Key results
Conclusion References (for reason chosen research question)
Narrative shows critical thinking
Multiple data sources
Recommendations and conclusions are grounded in the data
Limitations of the data set discussed. Next steps suggested in terms of
data that would allow for further analysis.
There are two types of emergency contraceptive pill prescribed by pharmacies.
- Levonogestrel (brand name: Levonelle) taken within three days of unprotected sex
- Ulipristal Acetate (brand name: ellaOne) taken within five days of unprotected sex
This report explores
Prescribing patterns of emergency contraceptive pill (ECP) between 2019 to 2023.
How ECP and antibiotics for STI prescriptions differ in geography comparing university town to a non-university town
Compare patterns in ECP and STI antibiotics prescribing during Covid-19 lockdowns - did the need for ECP or incidence of STI antibiotics prescription go down during lockdowns?
How does SIMD relate to prescription of ECP / antibiotics for STIs
# load necessary libraries
library(tidyverse)
library(janitor)
library(gt)
library(here)
library(lubridate)
library(patchwork)
library(plotly)
I have chosen to look at all prescriptions from 2020 and 2023. To do this I downloaded the prescriptions data for each month from: https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community
I wanted to explore how many ECP and antibiotics for STIs were prescibed during across 2023.
4 plots and / or tables No use of settings other than default Labelled, titled plots Labelled titled tables Appropriate and thoughtful data visualisation with some use of non-default settings (gt) Faceting, multilayered plots, interactive visualisation
Read in data:
# Read in the Health Board names (HB_names). Available from https://www.opendata.nhs.scot/dataset/geography-codes-and-labels/resource/652ff726-e676-4a20-abda-435b98dd7bdc
HB_names <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/652ff726-e676-4a20-abda-435b98dd7bdc/download/hb14_hb19.csv") %>%
clean_names() # ensures column names are unique, lower case and replaces spaces and special characters with underscores. I will use this function for all read ins to ensure my data is consistent and easy to manipulate.
# Read in population data per Health Board from the 2022 census data. Available from: https://statistics.ukdataservice.ac.uk/dataset/scotland-s-census-2022-uv102a-age-by-sex/resource/b2d295c2-af53-4b3d-a075-7815cadd9060
population_data <- read_csv(here("data", "UV103_age_health_board_census.csv"), skip = 10) %>% # locates the csv and excludes the first 10 lines in the csv as they are redundant
rename(Spare = "...6", # remove unused columns
hb_name = "Health Board Area 2019",
hb_population = Count) %>% # rename() formats the data to match the prescriptions dataframe
filter(Age == "All people" & Sex == "Female") %>% # filter female population, as men do not take ECP
select(hb_name, hb_population) %>% # select columns of interest
mutate(hb_name = paste("NHS", hb_name)) %>% # change hb_name column format to match the Health Board dataframe format
clean_names()
Define a function to read in the prescription data for a defined year:
# #For efficiency I created a function to read in my prescriptions datasets.
# #I downloaded 12 months of prescription data for each year from 2019 to 2022. I placed the 12 months of data into their relevant folder named all_months_year, where year was specific to the data it contained.
# read_all_prescriptions <- function(year){
# all_files <- list.files(here("data", paste0("all_months_", year)), pattern = "csv") #list.files() retrieves files from the relevant directory, and the paste0() dynamically constructs the folder name based on the year variable placed into the function.
# all_prescriptions <- all_files %>%
# map_dfr(~read_csv(here("data", paste0("all_months_", year),.))) %>% #map_dfr() row-binds the datasets being red in
# clean_names() %>%
# drop_na(bnf_item_description) # drop the rows with missing bnf_item values
# return(all_prescriptions)
# }
Read in datasets using read_all_prescriptions() function and start tidying:
# all_prescriptions_2019 <- read_all_prescriptions(2019)
# all_prescriptions_2020 <- read_all_prescriptions(2020)
# all_prescriptions_2021 <- read_all_prescriptions(2021)
# all_prescriptions_2022 <- read_all_prescriptions(2022)
#
# # 2019 dataset has hbt2014 as a column name instead of hbt. Rename to make column name consistent
# all_prescriptions_2019 <- all_prescriptions_2019 %>%
# rename(hbt = "hbt2014")
#
# # combine 4 years of data into one dataset to make it easier to wrangle
# combined_prescriptions <- bind_rows(
# all_prescriptions_2019,
# all_prescriptions_2020,
# all_prescriptions_2021,
# all_prescriptions_2022 )%>% # I mutate early and select the prescriptions of interest to prevent a very large dataset from being stored in my environment (prevents R slowing down and crashing)
# mutate(
# date = parse_date_time(paid_date_month, "ym"), # use lubridate to format date
# drug_simple = case_when(
# str_detect(bnf_item_description, "LEVONO") ~ "Levonorgestrel",
# str_detect(bnf_item_description, "ULIPR") ~ "Ulipristal Acetate",
# TRUE ~ "Other")) %>% # used case_when() to group different dosages of the same drug
# filter(drug_simple != "Other",!is.na(date)) %>% #remove prescriptions of no interest and any missing date values
# filter(hbt != "SB0806") %>% # filter out SB0806 as it is not a health board (Scottish Ambulance Service)
# filter(!is.na(hbt))
# # save combined_prescriptions dataset to csv
# # write_csv(combined_prescriptions,"data/combined_prescriptions.csv")
combined_prescriptions <- read_csv(here("data","combined_prescriptions.csv")) %>%
mutate(
date = parse_date_time(paid_date_month, "ym"), # use lubridate to format date
drug_simple = case_when(
str_detect(bnf_item_description, "LEVONO") ~ "Levonorgestrel",
str_detect(bnf_item_description, "ULIPR") ~ "Ulipristal Acetate",
TRUE ~ "Other")) %>% # used case_when() to group different dosages of the same drug
filter(drug_simple != "Other",!is.na(date))%>% # remove other prescriptions from the dataset and any missing date values
filter(hbt != "SB0806") %>% # filter out SB0806 as it is not a health board (Scottish Ambulance Service)
filter(!is.na(hbt))
Join and wrangle data:
# join the Prescriptions dataset to the Health Board names dataset and health board population data
ECP_scripts <- combined_prescriptions %>%
full_join(HB_names, by = c("hbt" = "hb")) %>% # Join with Health Board names
full_join(population_data, by = "hb_name") %>% # Join with population data
select(gp_practice, date, drug = drug_simple, hb_name, paid_quantity,hb_population) %>% # select columns of interest
mutate(month = factor(month(date), levels = 1:12, labels = month.abb), .after = date) %>% # extract month as a factor with labels to help when plotting
mutate(year = factor(year(date)), .after = month)
correct version:
# # select emergency contraceptive pill (ECP) to make dataframe smaller
# ECP_prescriptions <- combined_prescriptions %>%
# mutate(
# date = parse_date_time(paid_date_month, "ym"), # use lubridate to format date
# drug_simple = case_when(
# str_detect(bnf_item_description, "LEVONO") ~ "Levonorgestrel",
# str_detect(bnf_item_description, "ULIPR") ~ "Ulipristal Acetate",
# TRUE ~ "Other")) %>% # used case_when() to group different dosages of the same drug
# filter(drug_simple != "Other") %>% # remove other prescriptions from the dataset
# group_by(hbt, date, drug = drug_simple) %>%
# summarise(total_quantity = sum(paid_quantity, na.rm = TRUE),.groups = "drop") # sum prescriptions of ECP
#
# # full join HB_names and population_data to all_prescriptions dataset
# ECP_pop <- ECP_prescriptions %>%
# select(hb_name,date, drug, total_quantity,hb_population) %>% # Select columns of interest
# drop_na(total_quantity) %>% # Drop rows with missing values
# clean_names() %>%
# ungroup()
#
# ECP_table_data <- ECP_scripts %>%
# group_by(hb_name, drug, hb_population) %>%
# summarise(total_quantity = sum(paid_quantity, na.rm = TRUE),.groups = "drop") %>% # sum prescriptions of ECP
# ungroup()
#
#
# # calculate rate of prescriptions per 10,000 women
# ECP_pop_rate <- ECP_pop %>%
# mutate(prescriptions_per_10000 = (total_quantity / hb_population)*10000) %>%
# mutate(month = factor(month(date), levels = 1:12, labels = month.abb), .after = date) %>% # extract month as a factor with labels to help when plotting
# mutate(year = factor(year(date)), .after = month) %>% # extract year as a factor with labels to help when plotting
# ungroup()
This graph explores seasonal trends in prescribing of the two most commonly prescribed emergency contraceptive pills, Levonogestrel and Ulipristal Acetate, between 2019 to 2022 by Health Board. I plotted this to see if events such as Valentines day, University term start dates, or national holidays effect emergency contraceptive prescribing rates. I included 4 years of consecutive data to see if a one-time event, such as the Common Wealth Games caused any spikes in emergency contraception prescribing. Additionally by plotting the 4 consecutive years we can see if Covid-19 lockdowns in the years 2020 and 2021 affected emergency contraception prescribing rates.
Plot: Population-Weighted Rates: Ensures that national trends are calculated based on the total population across all health boards, avoiding bias from smaller regions. Accurate National-Level Insights: Better reflects overall prescribing behaviour at the national level, aligning with your goal to analyse seasonal and national events. prescribing rates are weighted by the total population, providing an accurate reflection of overall prescribing behaviour across Scotland.
total_population <- sum(population_data$hb_population, na.rm=TRUE)
ECP_presc_sum_month <- ECP_scripts %>%
group_by(date, drug, month, year) %>%
summarise(total_quantity_month = sum(paid_quantity, na.rm = TRUE)) %>%
ungroup() %>%
drop_na(drug) %>%
clean_names()
ECP_plot_data_v2 <- ECP_presc_sum_month %>%
group_by(drug, year, month) %>%
summarise(prescriptions_per_100000 = (total_quantity_month / total_population)*100000) #%>% #### 100,000
# labels / levels ? mutate( year = factor list c() #########
ECP_plot <- ECP_plot_data_v2 %>%
ggplot(aes(x = month, y = prescriptions_per_100000, group = year)) +
# Layer for other years
geom_line(
data = . %>% filter(year != 2020),
aes(linetype = as.factor(year)),
color = "grey70", # Grey for other years
size = 0.7
) +
# Layer for 2020
geom_line(
data = . %>% filter(year == 2020),
aes(linetype = as.factor(year)),
color = "red", # Highlight 2020 in red
size = 0.9
) +
facet_wrap(~drug, scales = "free_y") +
scale_linetype_manual(
values = c("2019" = "dotdash", "2020" = "solid", "2021" = "dashed", "2022" = "dotted"),
name = "Year") +
labs(
title = ("Seasonal Trends in Emergency Contraception Prescribing (2019–2022): \n Accounting for Population-Adjusted National Prescribing Rates"),
subtitle = "A four-year analysis of prescribing rates per 100,000 women, \n highlighting the impact of the Covid-19 lockdowns (March–July 2020, shown in red),\n with rates adjusted for population to ensure national-level accuracy",
x = "Month",
y = "Prescriptions per 100,000 women",
#color = "Year Colour"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 12, hjust = 0.5,face = "bold"),
plot.subtitle = element_text(size = 10, hjust= 0.5),
axis.title.x = element_text(size = 10),
axis.title.y = element_text(size = 10),
axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
axis.text.y = element_text(size = 8),
strip.text = element_text(size = 10),
legend.title = element_text(size = 10),
legend.text = element_text(size = 8),
legend.position = "bottom"
)
ECP_plot
ECP_plot_interactive <- ggplotly(ECP_plot)
ECP_plot_interactive
investigate what happened in 2019 - change in prescription practises over time ? did they used to use it for something else?
Trends to note:
It would be interesting to explore if different Health Boards prescribe different amounts of ECP.
play around:
ECP_table_data <- ECP_scripts %>%
group_by(hb_name, drug) %>% # Aggregate data by Health Board and drug
summarise(
total_quantity_4_years = sum(paid_quantity, na.rm = TRUE), # Total prescriptions over 4 years
avg_annual_total_quantity = total_quantity_4_years / 4, # Average annual prescriptions
hb_population = first(hb_population), # hb_population is consistent within each hb_name
.groups = "drop" # ungroup data
) %>%
drop_na(drug) %>% # remove rows with no drug values
mutate(avg_annual_presc_100000 = avg_annual_total_quantity * 100000 / hb_population) %>% #annual rate of prescriptions per 100,000
select(hb_name, drug, avg_annual_presc_100000) %>% # select columns before pivot
pivot_wider(
names_from = drug,
values_from = avg_annual_presc_100000,
values_fill = 0, # Fill missing values with 0
names_glue = "{drug}_rate" # Rename columns for clarity
) %>%
clean_names() # clean drug_rate column names following pivot
# calculate total prescription rate per health board
ECP_table_data <- ECP_table_data %>%
mutate(total_ECP_rate = rowSums(select(., levonorgestrel_rate, ulipristal_acetate_rate), na.rm = TRUE)) %>% # calculate total prescription rate for both drugs
arrange(desc(total_ECP_rate)) # Arrange by total rate in descending order
annual_avg_ECP_table <- ECP_table_data %>%
select(hb_name, levonorgestrel_rate, ulipristal_acetate_rate, total_ECP_rate) %>% # Select relevant columns
gt() %>%
cols_label(hb_name = "Health Board",
total_ECP_rate= "Total",
levonorgestrel_rate= " Levonorgestrel",
ulipristal_acetate_rate=" Ulipristal Acetate") %>% # Rename the column
# Format the columns with numbers to have two decimal places
fmt_number(columns = c(levonorgestrel_rate, ulipristal_acetate_rate, total_ECP_rate), decimals = 2) %>%
# Centre the number columns using American spelling
cols_align(align = "center",
columns = c(levonorgestrel_rate,ulipristal_acetate_rate)) %>%
# Add an overall summary of the number columns
grand_summary_rows(columns = c(levonorgestrel_rate,ulipristal_acetate_rate, total_ECP_rate),
fns = list("Overall Average" = ~mean(., na.rm = TRUE)),
fmt = list(~ fmt_number(., decimals = 2))) %>%
# Add a title and subtitle; md() allows text formating from mark down
tab_header(title = md("**Average Annual Rate of Emergency Contraception Prescriptions by Health Board in Scotland**"),
subtitle = md("This table presents the average annual prescription rates of emergency contraception pills (ECP) per 100,000 women, derived from the mean prescription rates across the years 2019 to 2022. Health Boards are ranked in descending order, starting with those with the highest prescription rates.")) %>%
# use tab_spanner to add a title to each drug and total column.
tab_spanner(
label = md("*Prescription rate per 100,000 women*"),
columns = c(levonorgestrel_rate,ulipristal_acetate_rate, total_ECP_rate)) %>%
tab_source_note(md("Data from Public Health Scotland. Available from: (https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community) ")) %>%
tab_stubhead(md("**2019-2022**")) %>%
tab_footnote(
footnote = md("*includes Capital City, Edinburgh*"),
locations = cells_body(columns = hb_name, rows = 2)
) %>%
opt_stylize(style = 6, color = "cyan")
annual_avg_ECP_table
| Average Annual Rate of Emergency Contraception Prescriptions by Health Board in Scotland | ||||
| This table presents the average annual prescription rates of emergency contraception pills (ECP) per 100,000 women, derived from the mean prescription rates across the years 2019 to 2022. Health Boards are ranked in descending order, starting with those with the highest prescription rates. | ||||
| 2019-2022 | Health Board |
Prescription rate per 100,000 women
|
||
|---|---|---|---|---|
| Levonorgestrel | Ulipristal Acetate | Total | ||
| NHS Greater Glasgow and Clyde | 3,375.85 | 28.52 | 3,404.37 | |
| NHS Lothian1 | 2,494.97 | 56.37 | 2,551.34 | |
| NHS Ayrshire and Arran | 2,027.24 | 15.29 | 2,042.53 | |
| NHS Tayside | 1,364.97 | 22.08 | 1,387.05 | |
| NHS Forth Valley | 1,268.46 | 73.31 | 1,341.78 | |
| NHS Grampian | 1,210.23 | 15.88 | 1,226.10 | |
| NHS Fife | 1,049.15 | 53.58 | 1,102.73 | |
| NHS Lanarkshire | 783.56 | 47.97 | 831.54 | |
| NHS Shetland | 778.66 | 17.45 | 796.11 | |
| NHS Orkney | 609.26 | 6.72 | 615.98 | |
| NHS Dumfries and Galloway | 514.87 | 24.25 | 539.12 | |
| NHS Western Isles | 191.58 | 197.27 | 388.85 | |
| NHS Highland | 284.97 | 52.14 | 337.11 | |
| NHS Borders | 242.26 | 76.18 | 318.44 | |
| Overall Average | — | 1,156.86 | 49.07 | 1,205.93 |
| Data from Public Health Scotland. Available from: (https://www.opendata.nhs.scot/dataset/prescriptions-in-the-community) | ||||
| 1 includes Capital City, Edinburgh | ||||
How does deprivation influence type of contraception prescribed? I wanted to explore if there was any correlation between the type of contraception being prescribed and deprivation. To do this I made a ratio of levonogestrel to total emergency contraception prescriptions.
A/(A+B) Where A = Levonogestrel (prescribed within 3 days of unprotected sex) B = Ulipristal Acetate (prescribed within 5 days of unprotected sex)
This is important to identify any variation in prescribing patterns in more deprived areas. Differences may suggest patients access health services later, so have to use Ulipristal Acetate, or that GPs or Pharmacists have differences in prescribing preferences in more deprived areas.
To measure deprivation I have used the Scottish Index of Multiple Deprivation. I took the SIMD 2020v2 dataset from Public Health Scotland website [available from: https://www.opendata.nhs.scot/gl/dataset/scottish-index-of-multiple-deprivation/resource/acade396-8430-4b34-895a-b3e757fa346e ] as this dataset contains SIMD decile weighted per Health Board population.
When interpreting SIMD in deciles rank 1 is considered the most deprived, and rank 10 is least deprived.
# # read in datasets:
#
# SIMD <- read_csv("https://www.opendata.nhs.scot/gl/dataset/78d41fa9-1a62-4f7b-9edb-3e8522a93378/resource/acade396-8430-4b34-895a-b3e757fa346e/download/simd2020v2_22062020.csv") %>%
# clean_names() %>%
# select(data_zone, simd2020v2hb_decile)
#
# gp_addresses <- read_csv("https://www.opendata.nhs.scot/dataset/f23655c3-6e23-4103-a511-a80d998adb90/resource/9c1dccc7-7632-4b13-b451-092bd57973a4/download/practice_contactdetails_apr2023-open-data-1.csv") %>%
# clean_names() %>%
# select(practice_code, gp_practice_name, data_zone)
#
# data_zones <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/395476ab-0720-4740-be07-ff4467141352/download/dz2011_codes_and_labels_21042020.csv") %>%
# clean_names() %>%
# select(data_zone, data_zone_name)
#
# # Create ECP_prescriptions_GP_location by using the GP_addresses dataset to map the GP practice code to a datazone. I could then use datazone to full_join() the SIMD dataset to the prescriptions dataset.
#
#
# ECP_GP <- ECP_presc_sum_month %>%
# left_join(gp_addresses, by = c("gp_practice" = "practice_code")) %>%
# #left_join(data_zones, by = "data_zone") %>%
# left_join(SIMD, by = "data_zone") %>%
# ungroup() %>%
# drop_na() %>% # Drop rows with missing values
# clean_names()
#
# ######
# ECP_GP <- combined_prescriptions %>%
# mutate(
# date = parse_date_time(paid_date_month, "ym"),
# drug_simple = case_when(
# str_detect(bnf_item_description, "LEVONO") ~ "Levonorgestrel",
# str_detect(bnf_item_description, "ULIPR") ~ "Ulipristal Acetate",
# TRUE ~ "Other")) %>%
# filter(drug_simple != "Other") %>%
# group_by(gp_practice, date, drug = drug_simple) %>% # this time keeping in gp_practice
# summarise(total_quantity = sum(paid_quantity, na.rm = TRUE)) %>%
# full_join(HB_names, by = c("hbt" = "hb")) %>% # Join with Health Board names
# full_join(population_data, by = "hb_name") %>% # Join with population data
# mutate(prescriptions_per_10000 = (total_quantity / hb_population)*10000) %>%
# left_join(gp_addresses, by = c("gp_practice" = "practice_code")) %>%
# #left_join(data_zones, by = "data_zone") %>%
# left_join(SIMD, by = "data_zone") %>%
# ungroup() %>%
# drop_na() %>% # Drop rows with missing values
# clean_names()
# ######
#
# ECP_SIMD_barchart <- ECP_GP %>%
# group_by(simd2020v2hb_decile, drug) %>%
# summarise(prescriptions_per_10000 = (total_quantity / hb_population)*10000) %>% # summarise the prescriptions per 10,000
# ggplot(aes(x= prescriptions_per_10000, y = factor(simd2020v2hb_decile, levels = 1:10), fill = drug)) +
# geom_col() +
# theme(legend.position = "bottom")
#
# ECP_SIMD_barchart
Prescribing choices vary per region -
mean(prescriptions_per_10000, na.rm = TRUE)
Data sets to consider exploring:
# Data sets to consider exploring:
#data_zones_2011 <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/395476ab-0720-4740-be07-ff4467141352/download/dz2011_codes_and_labels_21042020.csv") %>%
#clean_names()
#SIMD_2020 <- read_csv(here("data", "scottish-index-of-multiple-deprivation.csv")) %>%
#clean_names()
#gp_addresses_2020 <- read_csv("https://www.opendata.nhs.scot/dataset/f23655c3-6e23-4103-a511-a80d998adb90/resource/42391720-7dcb-48a2-8070-b9d63b246ac6/download/practice_contactdetails_oct2020-open-data.csv") %>%
#clean_names() %>%
#select(practice_code, gp_practice_name, data_zone)
#data_zones <- read_csv("https://www.opendata.nhs.scot/dataset/9f942fdb-e59e-44f5-b534-d6e17229cc7b/resource/395476ab-0720-4740-be07-ff4467141352/download/dz2011_codes_and_labels_21042020.csv") %>%
#clean_names() %>%
#select(data_zone, data_zone_name)
antibiotics are prescribed for pneumonia / other infections. Not specific to STIs. Maybe find what percentage of antibiotics for doxy / azithro are prescribed for STIs in scotland and use this as an assumption when completing further analysis?
I used https://gsverhoeven.github.io/post/zotero-rmarkdown-csl/ to set up citations. Chosen BMJ style as common.